Abstract

In this paper we revisit Talagrand's proof of the concentration inequality for empirical
processes. We give a different, shorter proof of the main technical lemma that guarantees
the existence of a certain kernel. Our proof provides an almost optimal value of the
constant involved in the statement of this lemma.



1 Introduction and the proof of main lemma. 

This paper was motivated by Section 4 of "New concentration inequalities in product
spaces" by Michel Talagrand. For the most part we keep the same notation, with possible
minor changes. We slightly weaken the definition of the distance $m(A, x)$ below compared
to [10], but, essentially, this is what is used in the proof of the concentration inequality for
empirical processes. Theorem 1 below corresponds to Theorem 4.2 in [10] and we assume that the
reader is familiar with the proof. The main technical step, Proposition 4.2 in [10], is proved
differently and constitutes the statement of Lemma 1 below.

Let $\Omega^n$ be a measurable product space with a product measure $\mu^n$. Consider a probability
measure $\nu$ on $\Omega^n$ and $x \in \Omega^n$. If $C_i = \{y \in \Omega^n : y_i \neq x_i\}$, we consider the image of the
restriction of $\nu$ to $C_i$ by the map $y \to y_i$, and its Radon-Nikodym derivative $d_i$ with respect
to $\mu$. As in [10] we assume that $\Omega$ is finite and each point is measurable with a positive
measure. Let $m$ be the number of atoms in $\Omega$ and $p_1, \ldots, p_m$ be their probabilities. By the
definition of $d_i$ we have

$$\int_{C_i} g(y_i)\, d\nu(y) = \int g(y_i)\, d_i(y_i)\, d\mu(y_i).$$
Consider the function

$$\psi(x) = \begin{cases} x^2/4, & \text{when } x \le 2, \\ x - 1, & \text{when } x \ge 2. \end{cases}$$

We set

$$m(\nu, x) = \sum_{i \le n} \int \psi(d_i)\, d\mu \quad \text{and} \quad m(A, x) = \inf\{m(\nu, x) : \nu(A) = 1\}.$$
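On a finite $\Omega$ the quantity $m(\nu, x)$ is a finite sum, so it can be computed directly. The following is a minimal sketch, assuming that $\mu$ and $\nu$ are stored as dictionaries of point masses; the names `psi` and `m_nu_x` and the two-point example are illustrative and not part of the paper.

```python
def psi(x):
    """psi(x) = x^2/4 for x <= 2 and x - 1 for x >= 2."""
    return x * x / 4.0 if x <= 2.0 else x - 1.0


def m_nu_x(nu, x, mu):
    """Return sum over i of the integral of psi(d_i) with respect to mu.

    nu: dict mapping points y of Omega^n (tuples) to probabilities.
    x:  a point of Omega^n (tuple).
    mu: dict mapping atoms of Omega to positive probabilities.
    """
    total = 0.0
    for i in range(len(x)):
        # Image of nu restricted to C_i = {y : y_i != x_i} under y -> y_i.
        image = {atom: 0.0 for atom in mu}
        for y, weight in nu.items():
            if y[i] != x[i]:
                image[y[i]] += weight
        # d_i is the density of that image with respect to mu.
        total += sum(psi(image[atom] / mu[atom]) * mu[atom] for atom in mu)
    return total


# Assumed toy example: Omega = {0, 1} with the uniform measure, n = 2.
mu = {0: 0.5, 1: 0.5}
x = (0, 0)
print(m_nu_x({(0, 0): 1.0}, x, mu))  # nu is the point mass at x: prints 0.0
print(m_nu_x({(1, 1): 1.0}, x, mu))  # nu differs from x in both coordinates: prints 1.0
```

In the second call each coordinate contributes $\psi(2) \cdot \frac{1}{2} = \frac{1}{2}$, since the density $d_i$ equals $2$ at the atom $1$ and $0$ at the atom $0$.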



Theorem 1 Let $L \ge 1.12$. Then for any $n$ and $A \subseteq \Omega^n$ we have

$$\int \exp\Big(\frac{1}{L}\, m(A, x)\Big)\, dP(x) \le \frac{1}{P(A)}. \qquad (1.1)$$



As we mentioned above the proof is identical to [10] where Proposition 4.2 is substituted 
by the following lemma. 

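Lemma 1 Let $g_1 \ge g_2 \ge \ldots \ge g_m > 0$. For $L \ge 1.12$ there exist $\{k_j^i : 1 \le j < i \le m\}$ such
that

$$k_j^i \ge 0, \qquad \sum_{j<i} k_j^i p_j \le 1 \qquad (1.2)$$

and

$$\sum_{i \le m} \frac{p_i}{g_i} \exp\Big\{\sum_{j<i} \Big(k_j^i \log\frac{g_i}{g_j} + \frac{1}{L}\psi(k_j^i)\Big) p_j\Big\} \le \frac{1}{p_1 g_1 + \ldots + p_m g_m}. \qquad (1.3)$$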



Remark: This lemma does not hold for $L < 1.07$ (it is easy to construct a counterexample
for $m = 2$), which means that $L = 1.12$ is close to optimal.
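For $m = 2$ the lemma involves a single coefficient $k = k_1^2$, so the inequality can be explored numerically. The sketch below assumes the normalization $g_2 = 1$, $g_1 = e^t$, and uses the fact that the exponent $-kt + \psi(k)/L$ is convex in $k$, so its minimizer over $0 \le k \le 1/p_1$ is available in closed form. The function names and the grid are illustrative choices; a grid search of this kind is only a sanity check, not a proof.

```python
import math


def psi(x):
    """psi(x) = x^2/4 for x <= 2 and x - 1 for x >= 2."""
    return x * x / 4.0 if x <= 2.0 else x - 1.0


def margin_m2(L, p1, t):
    """Right side minus left side of the lemma's inequality for m = 2.

    Assumes g_2 = 1 and g_1 = e^t with t >= 0, and 0 < p1 < 1.
    """
    p2 = 1.0 - p1
    # Minimize -k*t + psi(k)/L over 0 <= k <= 1/p1 (convex in k):
    # for t < 1/L the unconstrained minimizer is 2*L*t, otherwise the
    # objective is non-increasing in k and the constraint is active.
    k = min(2.0 * L * t, 1.0 / p1) if t < 1.0 / L else 1.0 / p1
    left = p1 * math.exp(-t) + p2 * math.exp((-k * t + psi(k) / L) * p1)
    right = 1.0 / (p1 * math.exp(t) + p2)
    return right - left


def worst_margin_m2(L, steps=100, t_max=3.0):
    """Smallest margin over a grid of p1 in (0, 1) and t in [0, t_max]."""
    return min(
        margin_m2(L, a / steps, t_max * b / steps)
        for a in range(1, steps)
        for b in range(steps + 1)
    )


for L in (1.0, 1.12):
    print(L, worst_margin_m2(L))
```

A negative value of `worst_margin_m2(L)` exhibits a grid point at which the inequality fails even for the optimal $k$; a nonnegative value (up to rounding) only says that no violation was found on this grid.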

Proof: The proof is by induction on the number of atoms $m$. The statement of the lemma is
trivial for $m = 1$. Note that in order to show the existence of $k_j^i$ in the statement of the lemma
one should try to minimize the left side of (1.3) with respect to $k_j^i$ under the constraints
(1.2). Note also that each term on the left side of (1.3) has its own set of $k_j^i$, $j < i$, and,
therefore, minimization can be performed for each term separately. We assume that $k_j^i$ are
chosen in an optimal way minimizing the left side of (1.3) and it will be convenient to take
among all such optimal choices the one maximizing $\sum_{j<i} k_j^i p_j$ for all $i \le m$. To make the
induction step we will start by proving the following statement, where we assume that $k_j^i$
correspond to the specific optimal choice indicated above.

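Statement. For all $i \le m$, we have

$$\sum_{j<i} k_j^i p_j < 1 \iff \log\frac{g_1}{g_i} < \frac{1}{L} \ \text{ and } \ \sum_{j<i} 2L \log\frac{g_j}{g_i}\, p_j < 1. \qquad (1.4)$$

In this case $k_j^i = 2L \log\frac{g_j}{g_i}$.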

Proof: Let us fix $i$ throughout the proof of the statement. We first assume that the left
side of (1.4) holds. Suppose that $\log\frac{g_1}{g_i} \ge \frac{1}{L}$. In this case, since $\sup\{\psi'(x) : x \in \mathbb{R}\} \le 1$, one
would decrease the left side of (1.3) by increasing $k_1^i$ until $\sum_{j<i} k_j^i p_j = 1$, which contradicts the
choice of $k_j^i$. On the other hand, $\log\frac{g_1}{g_i} < \frac{1}{L}$ implies that $k_j^i \le 2$, since for $k \ge 2$, $\psi(k) = k - 1$
and a choice of $k_j^i > 2$ would only increase the left side of (1.3). For $k \le 2$, $\psi(k) = k^2/4$ and

$$\operatorname{argmin}_k \Big(k \log\frac{g_i}{g_j} + \frac{k^2}{4L}\Big) = 2L \log\frac{g_j}{g_i}.$$

Hence, if $\sum_{j<i} 2L \log\frac{g_j}{g_i}\, p_j \ge 1$ then, since $\sum_{j<i} k_j^i p_j < 1$, the set

$$J := \Big\{j : k_j^i < 2L \log\frac{g_j}{g_i}\Big\}$$

is not empty. But again this would imply that $\sum_{j<i} k_j^i p_j = 1$; otherwise, increasing $k_j^i$ for
$j \in J$ would decrease the left side of (1.3). This completes the proof of the statement.

□

(1.4) implies that if $\sum_{j<i} k_j^i p_j < 1$ then $\sum_{j<l} k_j^l p_j < 1$ for $l \le i$. Therefore, the equality
$\sum_{j<m-1} k_j^{m-1} p_j = 1$ would imply $\sum_{j<m} k_j^m p_j = 1$. Let us first consider the case when
$\sum_{j<m-1} k_j^{m-1} p_j = 1$. This step is meaningless for $m = 2$ and should simply be skipped. We
will now show that $k_j^m = k_j^{m-1}$, $j < m - 1$, and $k_{m-1}^m = 0$. Indeed,




Since $\sum_{j<m-1} k_j^{m-1} p_j = 1$ it is advantageous to set $k_{m-1}^m = 0$ and $k_j^m = k_j^{m-1}$, $j < m - 1$. In
this case




By the induction assumption (1.3) holds for the sets $(g_1, \ldots, g_{m-1})$ and $(p_1, \ldots, p_{m-1} + p_m)$.
Since $p_{m-1} g_{m-1} + p_m g_m \le (p_{m-1} + p_m) g_{m-1}$, it is clear that it holds for $(g_1, \ldots, g_m)$ and
$(p_1, \ldots, p_m)$.

Now we will assume that $\sum_{j<m-1} k_j^{m-1} p_j < 1$ or, equivalently, $\log\frac{g_1}{g_{m-1}} < \frac{1}{L}$ and
$\sum_{j<m-1} 2L \log\frac{g_j}{g_{m-1}}\, p_j < 1$. It is obvious that in this case there exists $g_0 < g_{m-1}$ such
that for $g_m \in (g_0, g_{m-1}]$ both $\log\frac{g_1}{g_m} < \frac{1}{L}$ and $\sum_{j<m} 2L \log\frac{g_j}{g_m}\, p_j < 1$ hold and, therefore,
$\sum_{j<m} k_j^m p_j < 1$. We assume that $g_0$ is the smallest number with such properties. Let us
show that for fixed $g_1, \ldots, g_{m-1}$ the case of $g_m < g_0$ can be converted to $g_m = g_0$. Indeed,
take $g_m < g_0$. Clearly, $\sum_{j<m} k_j^m p_j = 1$. In this case (1.5) still holds and implies that $k_j^m$ do
not depend on $g_m$ for $g_m < g_0$. We have

$$\frac{1}{g_m} \exp\Big\{\sum_{j<m} \Big(k_j^m \log\frac{g_m}{g_j} + \frac{1}{L}\psi(k_j^m)\Big) p_j\Big\} = \exp\Big\{\sum_{j<m} \Big(k_j^m \log\frac{1}{g_j} + \frac{1}{L}\psi(k_j^m)\Big) p_j\Big\},$$

which means that for $g_m < g_0$ the left side of the inequality (1.3) does not depend on $g_m$.
Since $(p_1 g_1 + \ldots + p_m g_m)^{-1}$ decreases in $g_m$ it is enough to prove the inequality for $g_m = g_0$.
Hence, we can finally assume that $\log\frac{g_1}{g_m} \le \frac{1}{L}$, $\sum_{j<m} 2L \log\frac{g_j}{g_m}\, p_j \le 1$ and $k_j^m = 2L \log\frac{g_j}{g_m}$.
(1.3) can be rewritten as

$$\sum_{i \le m} \frac{p_i}{g_i} \exp\Big\{-L \sum_{j<i} \Big(\log\frac{g_j}{g_i}\Big)^2 p_j\Big\} \le \frac{1}{p_1 g_1 + \ldots + p_m g_m}. \qquad (1.6)$$

It is easy to see that by the induction hypothesis (1.6) holds for $g_m = g_{m-1}$. To prove it for
$g_m < g_{m-1}$ we will compare the derivatives of both sides of (1.6) with respect to $g_m$. It is
enough to have

$$\frac{p_m}{g_m^2} \exp\Big\{-L \sum_{j<m} \Big(\log\frac{g_j}{g_m}\Big)^2 p_j\Big\} \Big(-1 + 2L \sum_{j<m} \log\frac{g_j}{g_m}\, p_j\Big) \ge -\frac{p_m}{(p_1 g_1 + \ldots + p_m g_m)^2}$$

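or, equivalently,

$$\exp\Big\{-L \sum_{j<m} \Big(\log\frac{g_j}{g_m}\Big)^2 p_j\Big\} \Big(1 - 2L \sum_{j<m} \log\frac{g_j}{g_m}\, p_j\Big) \le \Big(\frac{g_m}{p_1 g_1 + \ldots + p_m g_m}\Big)^2.$$

Since $1 - x \le e^{-x}$ for $x \ge 0$ it is enough to show

$$\exp\Big\{-L \sum_{j<m} p_j \Big(\Big(\log\frac{g_j}{g_m}\Big)^2 + 2 \log\frac{g_j}{g_m}\Big)\Big\} \le \Big(\frac{g_m}{p_1 g_1 + \ldots + p_m g_m}\Big)^2.$$
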

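One can check that $(\log x)^2 + 2 \log x$ is concave for $x \ge 1$. If we express $g_j = \lambda_j g_1 + (1 - \lambda_j) g_m$,
$j = 1, \ldots, m - 1$, then

$$\sum_{j<m} p_j \Big(\Big(\log\frac{g_j}{g_m}\Big)^2 + 2 \log\frac{g_j}{g_m}\Big) \ge \Big(\sum_{j<m} p_j \lambda_j\Big) \Big(\Big(\log\frac{g_1}{g_m}\Big)^2 + 2 \log\frac{g_1}{g_m}\Big),$$

$$p_1 g_1 + \ldots + p_m g_m = \Big(\sum_{j<m} p_j \lambda_j\Big) g_1 + \Big(p_m + \sum_{j<m} (1 - \lambda_j) p_j\Big) g_m.$$

If we denote $p = \sum_{j<m} p_j \lambda_j$ and $t = \log\frac{g_1}{g_m}$ we have to prove

$$\exp\{-Lp(t^2 + 2t)\} \le \Big(\frac{1}{p e^t + 1 - p}\Big)^2, \quad 0 \le p \le 1, \quad 0 \le t \le \frac{1}{L}. \qquad (1.7)$$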

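Equivalently,

$$\varphi(p, t) = (p e^t + 1 - p) \exp\Big\{-\frac{L}{2}\, p (t^2 + 2t)\Big\} \le 1, \quad 0 \le p \le 1, \quad 0 \le t \le \frac{1}{L}.$$

We have

$$\varphi'_t(p, t) = \varphi(p, t) \Big(\frac{p e^t}{p e^t + 1 - p} - Lp(t + 1)\Big).$$

Since $\varphi(p, 0) = 1$ for all $p \ge 0$, we need $\varphi'_t(p, 0) = p(1 - L) \le 0$, or $L \ge 1$, which holds if
$L \ge 1.12$. It is easy to see that $\varphi'_t(p, t) = 0$ in at most one point $t$. In combination with
$\varphi'_t(p, 0) \le 0$ it implies that for a fixed $p$ the maximum of $\varphi(p, t)$ is attained at $t = 0$ or $t = 1/L$.
Therefore, we have to show $\varphi(p, 1/L) \le 1$, $0 \le p \le 1$. We have

$$\varphi'_p\Big(p, \frac{1}{L}\Big) = \varphi\Big(p, \frac{1}{L}\Big) \Big(\frac{e^{1/L} - 1}{p e^{1/L} + 1 - p} - \frac{L}{2}\Big(\frac{1}{L^2} + \frac{2}{L}\Big)\Big).$$

Since $\varphi(0, \frac{1}{L}) = 1$ we should have $\varphi'_p(0, \frac{1}{L}) \le 0$, which would also imply $\varphi'_p(p, \frac{1}{L}) \le 0$, $p > 0$.
One can check that

$$\varphi'_p\Big(0, \frac{1}{L}\Big) = e^{1/L} - 1 - \frac{1}{2}\Big(\frac{1}{L} + 2\Big) \le 0$$

for $L \ge 1.12$. This finishes the proof of the lemma.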
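The last step of the proof reduces to elementary facts about the function $\varphi(p, t) = (p e^t + 1 - p) \exp\{-\frac{L}{2} p (t^2 + 2t)\}$, which are easy to probe numerically. The following sketch evaluates the derivative $\varphi'_p(0, 1/L)$ and the maximum of $\varphi$ over a grid; the function names and the grid resolution are illustrative choices, and the grid maximum is a sanity check rather than a substitute for the argument above.

```python
import math


def phi(p, t, L):
    """phi(p, t) = (p e^t + 1 - p) exp(-(L/2) p (t^2 + 2t))."""
    return (p * math.exp(t) + 1.0 - p) * math.exp(-0.5 * L * p * (t * t + 2.0 * t))


def dphi_dp_at_zero(L):
    """Derivative of phi in p at p = 0, t = 1/L: e^{1/L} - 1 - (1/L + 2)/2."""
    return math.exp(1.0 / L) - 1.0 - 0.5 * (1.0 / L + 2.0)


def max_phi(L, steps=200):
    """Largest value of phi on a grid over 0 <= p <= 1 and 0 <= t <= 1/L."""
    return max(
        phi(a / steps, (b / steps) / L, L)
        for a in range(steps + 1)
        for b in range(steps + 1)
    )


for L in (1.11, 1.12):
    print(L, dphi_dp_at_zero(L), max_phi(L))
```

The sign of `dphi_dp_at_zero(L)` changes between $L = 1.11$ and $L = 1.12$, which is where the constant $1.12$ in this argument comes from.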



2 One concentration inequality for empirical processes. 

Given Theorem 1 one can proceed as in [10] to obtain the classical form of the concentration
inequality for the empirical process around its mean.

We will now show that in one special case, which allows certain simplifications, the
technique of Talagrand yields a rather sharp concentration result with explicit constants.
Consider a countable class of measurable functions $\mathcal{F} = \{f : \Omega \to [0, 1]\}$. Consider the
following function on $\Omega^n$:

$$Z(x) = \sup_{f \in \mathcal{F}} \sum_{i \le n} (\mu f - f(x_i)),$$

where $\mu f := \int f\, d\mu$. It often happens in applications (see [3], [1], 0), especially in the case
when the empirical process is defined over a family of sets, that the uniform variance
$n \sup_{f \in \mathcal{F}} \mathrm{Var} f$ is simply bounded by the uniform second moment

$$\sigma^2 = n \sup_{f \in \mathcal{F}} \mu f^2 \qquad (2.1)$$

for which one has an a priori bound. Talagrand's technique gives in this case a proof of the
following concentration inequalities.

Theorem 2 Let $L = 1.12$ and $M$ be a median of $Z$. Then

$$P\big(Z \ge M + 2 \max(Lu, \sigma\sqrt{Lu})\big) \le 2e^{-u},$$
$$P\big(Z \le M - 2 \max(Lu, \sigma\sqrt{Lu})\big) \le 2e^{-u}.$$

Proof. Without loss of generality we assume that $\mathcal{F}$ is finite. Given $a$ let us consider the set
$A = \{Z(x) \le a\}$. For a fixed $x$ let $f \in \mathcal{F}$ be such that

$$Z(x) = \sum_{i \le n} (\mu f - f(x_i)). \qquad (2.2)$$

Then for a probability measure $\nu$ such that $\nu(A) = 1$ we have

$$Z(x) - a \le \int \Big(\sum_{i \le n} (\mu f - f(x_i)) - \sum_{i \le n} (\mu f - f(y_i))\Big)\, d\nu(y)$$

$$= \sum_{i \le n} \int (f(y_i) - f(x_i))\, d_i(y_i)\, d\mu(y_i) \le \sum_{i \le n} \int f(y_i)\, d_i(y_i)\, d\mu(y_i).$$


As is easily checked, for $v \ge 0$ and $0 \le u \le 1$,

$$uv \le u^2 + \psi(v).$$
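The inequality $uv \le u^2 + \psi(v)$ can be seen from two identities: for $v \le 2$ the gap $u^2 + \psi(v) - uv$ equals $(u - v/2)^2$, and for $v \ge 2$ it equals $(1 - u)(v - 1 - u)$, which is nonnegative when $0 \le u \le 1$. The sketch below checks the gap on a grid as a sanity test; the function name `worst_gap`, the grid, and the cutoff `v_max` are illustrative assumptions.

```python
def psi(x):
    """psi(x) = x^2/4 for x <= 2 and x - 1 for x >= 2."""
    return x * x / 4.0 if x <= 2.0 else x - 1.0


def worst_gap(steps=200, v_max=10.0):
    """Smallest value of u^2 + psi(v) - u*v over 0 <= u <= 1, 0 <= v <= v_max."""
    return min(
        (a / steps) ** 2 + psi(v_max * b / steps) - (a / steps) * (v_max * b / steps)
        for a in range(steps + 1)
        for b in range(steps + 1)
    )


# Nonnegative up to floating-point rounding; the gap vanishes when v = 2u.
print(worst_gap())
```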

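Therefore, for any $\delta \ge 1$

$$Z(x) - a \le \delta \sum_{i \le n} \int \frac{f(y_i)}{\delta}\, d_i(y_i)\, d\mu(y_i) \le \frac{1}{\delta} \sum_{i \le n} \int f^2(y_i)\, d\mu(y_i) + \delta \sum_{i \le n} \int \psi(d_i)\, d\mu.$$

Taking the infimum over $\nu$ we get

$$Z(x) \le a + \frac{1}{\delta}\sigma^2 + \delta\, m(A, x).$$

Theorem 1 then implies that for $L = 1.12$ with probability at least $1 - \frac{1}{P(Z \le a)}\, e^{-u}$

$$Z(x) \le a + 2 \max(Lu, \sigma\sqrt{Lu}).$$

Applying this to $a = M$, a median of $Z$, and to $a = M - 2 \max(Lu, \sigma\sqrt{Lu})$ gives the result.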
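The passage from Theorem 1 to the term $2\max(Lu, \sigma\sqrt{Lu})$ can be spelled out as follows. By Markov's inequality, Theorem 1 gives $m(A, x) \le Lu$ with probability at least $1 - e^{-u}/P(A)$, and on this event $Z(x) - a \le \sigma^2/\delta + \delta L u$ for every $\delta \ge 1$. Choosing $\delta = \max(1, \sigma/\sqrt{Lu})$ gives the stated bound. The sketch below checks this choice numerically; the function names and the sample values of $\sigma$ and $u$ are illustrative assumptions.

```python
import math


def bound(delta, sigma, L, u):
    """The quantity sigma^2/delta + delta*L*u, valid once m(A, x) <= L*u."""
    return sigma ** 2 / delta + delta * L * u


def best_delta(sigma, L, u):
    """Minimizer of bound over delta >= 1; sigma/sqrt(L*u) is the unconstrained one."""
    return max(1.0, sigma / math.sqrt(L * u))


def deviation(sigma, u, L=1.12):
    """The deviation term 2*max(L*u, sigma*sqrt(L*u)) of Theorem 2."""
    return 2.0 * max(L * u, sigma * math.sqrt(L * u))


# Assumed sample values: one with sigma^2 > L*u and one with sigma^2 < L*u.
for sigma, u in ((3.0, 1.0), (0.5, 2.0)):
    d = best_delta(sigma, 1.12, u)
    print(sigma, u, d, bound(d, sigma, 1.12, u), deviation(sigma, u))
```

When $\sigma^2 \ge Lu$ the optimized bound equals $2\sigma\sqrt{Lu}$; when $\sigma^2 < Lu$ the choice $\delta = 1$ gives $\sigma^2 + Lu \le 2Lu$.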

Remark. It is interesting to notice that the bounds of Theorem 2 seem to avoid the
"singular" behaviour of the general bounds expressed in terms of the weak variance $n \sup_{f \in \mathcal{F}} \mathrm{Var} f$
(see [6]), when the linear dependence of the term $(1 + \varepsilon)M$ on $\varepsilon$ requires a factor of the
order $\varepsilon^{-1}$ in the last term of the bound, $\varepsilon^{-1} u$. Under the assumptions of Theorem 2, one
can also avoid this "singularity" using the recent result of Emmanuel Rio [H], which provides
rather sharp constants too, and the concentration is around the mean instead of the median.

Acknowledgments. We want to thank Michel Talagrand for pointing out the recent 
results of Emmanuel Rio. 